Week 05: Semantic Analysis and Type Checking

Semantic Analysis

Error Detection

  • Lexical Analysis: Detects inputs with illegal tokens.
  • Parsing: Detects inputs with ill-formed parse trees.
  • Semantic Analysis: Catches all remaining errors.
  • Example. (Cool)
    1. All identifiers are declared
    2. Types
    3. Inheritance relationships
    4. Classes defined only once
    5. ...

Scope

  • Matching identifier declarations with uses.
  • The scope of an identifier is the portion of a program in which that identifier is accessible.
  • The same identifier may refer to different things in different parts of the program.
  • Most languages have static scope.
  • A few languages are dynamically scoped.
    • A dynamically-scoped variable refers to the closest enclosing binding in the execution of the program.
  • Cool identifier bindings are introduced by
    • Class declarations (introduce class names)
    • Method definitions (introduce method names)
    • Let expressions (introduce object id’s)
    • Formal parameters (introduce object id’s)
    • Attribute definitions (introduce object id’s)
    • Case expressions (introduce object id’s)
  • Not all identifiers follow the most-closely nested rule.
    • Example. (class definitions in Cool)
      • Globally visible throughout the program.
  • Attribute names are global within the class in which they are defined.
  • Methods may also be redefined (overridden).

Symbol Table

  • Much of semantic analysis can be expressed as a recursive descent of an AST
    • Before: Push the identifiers into symbol tables (stack).
    • In the Process: Process the children of AST.
    • After: Pop the identifier up from symbol tables (stack).
    • Use a stack because there are identifier definitions with the same name.
  • Symbol Table Operations
    • add_symbol(x)
    • find_symbol(x): ask type in semantics / memory location in cgen
    • remove_symbol(x)
    • enter_scope()
    • exit_scope()
    • check_scope(x)
  • Because class can be used before being defined, how to manage class?
    • Solution: do 2 tree traversals.
      • Pass 1: Gather all class names
      • Pass 2: Do the type checking
    • Actually, semantic analysis requires multiple passes.

Type Checking

Types

  • Meaning: The notion varies in different languages.
  • Consensus
    • A set of values
    • A set of operations on those values
  • In assembly language level, there are no types. For add $r1, $r2, $r3, just add it bitwisely.
  • Same type, but not make sense.
    • Function Pointer (int) + Number (int)
  • Type system in a language specifies which operations are valid for which types.
  • The goal of type checking is to ensure that operations are used only with the correct type.
  • Three kinds of languages
    • Statically typed: All or almost all checking of types is done as part of compilation.
    • Dynamically typed: Almost all checking of types is done as part of program execution.
    • Untyped: No type checking.
  • Competing view
    • Static Typing
      • Static checking catches many programming errors at compile time.
      • Avoids overhead of runtime type checks.
    • Dynamic Typing
      • Static type systems are restrictive.
      • Rapid prototyping difficult within a static type system.
  • A lot of code is written in statically typed languages with an “escape” mechanism. Unsafe casts in C, Java.
  • People retrofit static typing to dynamically typed languages for optimization, debugging.
  • The compiler infers types for every expression.
  • Term.
    • Type Checking is the process of verifying fully typed programs.
    • Type Inference is the process of filling in missing type information.

Type Checking

  • We have seen two examples of formal notation specifying parts of a compiler.
    • Lexical Analysis: Regular expressions
    • Parsing: Context-free grammars
    • Type checking: Logical rules of inference
  • Inference Rules
    • Form: If hypothesis is true, then Conclusion is true.
    • Type checking: If $E_1$ and $E_2$ have certain types, then $E_3$ has a certain type.
    • Notation
      • And: $\wedge$
      • If-Then: $\Rightarrow$
      • $x$ has type $T$: $x:T$
      • It is provable that...: $\vdash$
  • Example.
    • If $e_1$ has type Int and $e_2$ has type Int, then $e_1+e_2$ has type Int.
    • ($e_1$ has type Int $\wedge$ $e_2$ has type Int) $\Rightarrow$ $e_1+e_2$ has type Int.
    • $(e_1:Int \wedge e_2:Int) \Rightarrow e_1+e_2:Int$
    • $Hypothesis_1 \wedge \dots \wedge Hypothesis_n \Rightarrow Conclusion$
  • Tradition inference rules are writen by $$\frac{\vdash Hypothesis \dots \vdash Hypothesis}{\vdash Conclusion}$$
  • Cool type rules: $\vdash e:T$
  • Example.
    • [Int] $$\frac{i:Constant}{\vdash i:Int}$$
    • [Add] $$\frac{\vdash e_1:Int \vdash e_2:Int}{\vdash e_1+e_2:Int}$$
    • [1+2] $$\frac{\frac{1:Constant}{\vdash 1:Int}\frac{2:Constant}{\vdash 2:Int}}{\vdash 1+2:Int}$$
  • A type system is sound if
    • When $\vdash e:T$, then $e$ evaluates to a value of type $T$.
    • We only want sound rules and the more precise one. $(\vdash i:Int) > (\vdash i:Object)$
  • Types are computed in a bottom-up pass over the AST.

Type Environment

  • Other Rules
    • [False] $\frac{}{\vdash false: Bool}$
    • [String] $\frac{s:StringConstant}{\vdash s:String}$
    • (Ignore SELF_TYPE) new T produce an object of type T
      • [New] $\frac{}{\vdash new T:T}$
    • [Not] $\frac{\vdash e: Bool}{\vdash !e:Bool}$
    • [Loop] $\frac{\frac{\vdash e_1:Bool}{\vdash e_2:T}}{\vdash while\ e_1 loop\ e_2 pool:Object}$
    • [Var] $\frac{x\ is\ a\ variable}{\vdash x: ?}$
      • We need the type rule.
  • Type Environment
    • A type environment gives types for free variables.
      • A type environment is a function from Object Identifiers to Types
      • A variable is free in a expression if it is not defined within the expression.
    • Let $O$ be a function from Object Identifiers to Types.
    • Writen by: $O\vdash e:T$
    • Add to every rules.
      • [Int] $$\frac{i:Constant}{O\vdash i:Int}$$
      • [Add] $$\frac{O\vdash e_1:Int\ O\vdash e_2:Int}{O\vdash e_1+e_2:Int}$$
      • [Var] $$\frac{O(x)=T}{O\vdash x:T}$$
      • [Let-No-Init] $$\frac{O[T_0/x]\vdash e_1:T_1}{O\vdash let\ x:T_0\ in\ e_1:T_1}$$
      • Remark
        • $O[T/x](x) = T$
        • $O[T/x](y) = O(y)$
    • Remark
      • The type environment gives types to the free identifiers in the current scope.
      • The type environment is passed down the AST from the root towards the leaves.
      • Types are computed up the AST from the leaves towards the root.

Subtyping

  • Consider let rule
    • [Let-Init] $$\frac{O\vdash e_0:T_0\\O[T_0/x]\vdash e_1:T_1}{O\vdash let\ x:T_0\leftarrow e_0\ in\ e_1:T_1}$$
  • Define a relation $\leq$ on classes
    • $X\leq X$
    • $X\leq Y$ if $X$ inherits from $Y$
    • $X\leq Z$ if $X\leq Y$ and $Y\leq Z$
      • [Let-Init] $$\frac{O\vdash e_0:T_0\\O[T_0/x]\vdash e_1:T_1\\T_0\leq T}{O\vdash let\ x:T\leftarrow e_0\ in\ e_1:T_1}$$
      • [Assign] $$\frac{O(x)=T_0\\O\vdash e_1:T_1\\T_1\leq T_0}{O\vdash x\leftarrow e_1:T_1}$$
  • Let $O_c(x)=T$ for all attributes $x:T$ in class $C$
    • [Attr-Init] $$\frac{O_c(x)=T_0\\O_c\vdash e_1:T_1\\T_1\leq T_0}{O_c\vdash x:T_0\leftarrow e_1}$$
  • How to decide the return type of if statement?
    • if $e_0$ then $e_1$ else $e_2$ fi
    • The best way we can do is the smallest supertype larger than the type of $e_1$ or $e_2$.
    • $lub(X,Y)$, the least upper bound of $X$ and $Y$, is $Z$ if
      • $X\leq Z \wedge Y \leq Z$, $Z$ is an upper bound.
      • $X\leq Z' \wedge Y \leq Z' \Rightarrow Z\leq Z'$, $Z$ is an upper bound.
    • In COOL, the least upper bound of two types is their least common ancestor in the inheritance tree.
    • [If-Then-Else] $$\frac{O\vdash e_0:Bool\\O\vdash e_1:T_1\\O\vdash e_2:T_2}{O\vdash if\ e_0\ then\ e_1\ else\ e_2\ fi:lub(T_1,T_2)}$$
    • [Case] $$\frac{O\vdash e_0:T_0\\O[T_1/x_1]\vdash e_1:T_{1`}\\\dots\\ O[T_n/x_n]\vdash e_n:T_{n'}}{O\vdash case\ e_0\ of\ x_1:T_1\rightarrow e_1;\dots ;x_n:T_n\rightarrow e_n;esac:lub(T_1,T_2)}$$

Typing Methods

  • In COOL, method and object identifiers live in different namespaces.
    • A method foo and an object foo can coexist in the same scope.
    • We use a separate mapping $M$ For method signatures $$M(C,f)=(T_1,\dots,T_n,T_{n+1})$$ means in class $C$ there is a method $f$ $$f(x_1:T_1,\dots,x_n:T_n):T_{n+1}$$
    • [Dispatch] $$\frac{O,M\vdash e_0:T_0\\ O,M\vdash e_1:T_1\\ \dots\\ O,M\vdash e_n:T_n\\ M(T_0, f)=(T_{1'},\dots,T_{n'},T_{n+1})\\ T_i\leq T_{i'},\ for\ 1\leq i\leq n}{O,M\vdash e_0.f(e_1,\dots,e_n):T_{n+1}}$$
    • [Static Dispatch] $$\frac{O,M\vdash e_0:T_0\\ O,M\vdash e_1:T_1\\ \dots\\ O,M\vdash e_n:T_n\\ T_0\leq T\\ M(T, f)=(T_{1'},\dots,T_{n'},T_{n+1})\\ T_i\leq T_{i'},\ for\ 1\leq i\leq n}{O,M\vdash e_0@T.f(e_1,\dots,e_n):T_{n+1}}$$
  • The method environment must be added to all rules.
  • In most cases, $M$ is passed down but not actually used. (Only the dispatch rules use $M$)
    • [Add] $$\frac{O,M\vdash e_1:Int\ O,M\vdash e_2:Int}{O,M\vdash e_1+e_2:Int}$$
  • To derive the type of SELF_TYPE, we need to know the class in which an expression appears.
  • The full type environment for COOL
    • A mapping $O$ giving types to object id's
    • A mapping $M$ giving types to methods
    • The current class $C$
    • [Add] $$\frac{O,M,C\vdash e_1:Int\ O,M,C\vdash e_2:Int}{O,M,C\vdash e_1+e_2:Int}$$
  • General Themes
    • Type rules are defined on the structure of expressions.
    • Types of variables are modeled by an environment.

Implementation of Typing Checking

  • COOL type checking can be implemented in a single traversal over the AST.
  • Type environment is passed down the tree from parent to child.
  • Types are passed up the tree from child to parent.
    • [Add] $$\frac{O,M,C\vdash e_1:Int\ O,M,C\vdash e_2:Int}{O,M,C\vdash e_1+e_2:Int}$$
    • TypeCheck(Environment, e1+e2) = {
        T1 = TypeCheck(Environment, e1);
        T2 = TypeCheck(Environment, e2);
        Check T1 == T2 == Int;
        return Int;
      }
      
    • [Let-Init] $$\frac{O\vdash e_0:T_0\\O[T_0/x]\vdash e_1:T_1\\T_0\leq T}{O\vdash let\ x:T\leftarrow e_0\ in\ e_1:T_1}$$
    • TypeCheck(Environment, let x:T <- e0 in e1) = {
        T0 = TypeCheck(Environment, e0);
        T1 = TypeCheck(Environment.add(x:T), e1);
        Check subtype(T0,T);
        return T1;
      }
      

In [ ]: